A Comparison of Tagging Strategies for Statistical Information Extraction
نویسنده
چکیده
There are several approaches that model information extraction as a token classification task, using various tagging strategies to combine multiple tokens. We describe the tagging strategies that can be found in the literature and evaluate their relative performances. We also introduce a new strategy, called Begin/After tagging or BIA, and show that it is competitive to the best other strategies.
منابع مشابه
A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model
Information extraction (IE) is a process of automatically providing a structured representation from an unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) which has been intensified by the increased volume of information and heterogeneity, and non-structured form of it. One of the core information extraction tasks is relation extraction wh...
متن کاملMatrix : a statistical method and software tool for linguistic analysis through corpus comparison
Matrix: A statistical method and software tool for linguistic analysis through corpus comparison A thesis submitted to Lancaster University for the degree of Ph.D. in Computer Science Paul Edward Rayson, B.Sc. September 2002 This thesis reports the development of a new kind of method and tool (Matrix) for advancing the statistical analysis of electronic corpora of linguistic data. First, we des...
متن کاملOptimal Feature Extraction for Discriminating Raman Spectra of Different Skin Samples using Statistical Methods and Genetic Algorithm
Introduction: Raman spectroscopy, that is a spectroscopic technique based on inelastic scattering of monochromatic light, can provide valuable information about molecular vibrations, so using this technique we can study molecular changes in a sample. Material and Methods: In this research, 153 Raman spectra obtained from normal and dried skin samples. Baseline and electrical noise were eliminat...
متن کاملName Tagging Using Lexical, Contextual, and Morphological Information
Abstract This paper presents a probabilistic model for automatically tagging names in a Turkish text. We used four different information sources to model names, and successfully combined them. Our first information source is based on the surface forms of the words. Then we combined the contextual cues with the lexical model, and obtained a significant improvement. After this, we modeled the mor...
متن کاملClassification Structuring Tagging
This paper presents an information extraction system that processes the textual content of classiied newspaper advertisements in French. The system uses both lexical (words, regular expressions) and contextual information to structure the content of the ads on the basis of predeened thematic forms. The paper rst describes the enhanced tagging mechanism used for extraction. A quantitative evalua...
متن کامل